Ames Real Estate Machine Learning Project

Yi Cao, Daniel Choy, Ling Ge Zeng

Objective: The first part of this project explores the characteristics of the Ames housing market. The second part compares several regression models and assesses their test-set performance.

Overview of Ames Housing Market

Ames House Sale Price Distribution

[Figure; y-axis: Density]

The distribution is right-skewed: the Ames housing market has more outliers on the expensive side than on the cheap side.
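One way to quantify this asymmetry is the sample skewness: a positive value means the tail on the expensive side is longer than the tail on the cheap side. The minimal sketch below uses only the standard library and computes the adjusted Fisher-Pearson estimator (the statistic pandas `Series.skew` reports). The prices in the usage line are made up for illustration and are not Ames data.

```python
def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1); positive means a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)     # small-sample adjustment


# Hypothetical prices with one expensive outlier: the result is positive.
print(sample_skewness([120000, 135000, 150000, 160000, 175000, 450000]))
```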

Ames Housing Market Trend

[Figure; y-axis: Sale Price per Square Foot]

Price per square foot in the Ames housing market was relatively stable over the years covered by the data (2006 to 2010).
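Price per square foot is the sale price divided by the living area. Below is a minimal standard-library sketch of a yearly summary. It assumes each sale is available as a (year sold, sale price, living area) tuple. The notebook does not show its own aggregation code, so using the median rather than the mean is also an assumption.

```python
from collections import defaultdict
from statistics import median


def median_price_per_sqft_by_year(sales):
    """sales: iterable of (year_sold, sale_price, living_area_sqft) tuples."""
    by_year = defaultdict(list)
    for year, price, area in sales:
        if area > 0:  # skip records without a usable living area
            by_year[year].append(price / area)
    # One median per year, in chronological order.
    return {year: median(vals) for year, vals in sorted(by_year.items())}
```

A roughly flat sequence of yearly medians is what "relatively stable" looks like numerically.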

Ames Housing Market Overview by Price

Ames Housing Market Overview by Price per Square Foot

Building Type in Ames

We break the analysis down into the following categories:

1. Neighborhood

2. House Size

3. House Age

4. House Features

5. Other Features

Neighborhood

Sale Price per Square Foot by Neighborhood

The most expensive houses are concentrated in the northern part of Ames.

Distance from Iowa State University vs Sale Price

[Figure; y-axis: Sale Price per Square Foot]

Distance from Iowa State University vs Sale Price

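The Distance feature used in the regression models measures how far a house is from Iowa State University. The notebook does not show how it was computed. One common choice is the great-circle (haversine) distance, which assumes each house has latitude/longitude coordinates. In the sketch below, the campus coordinates (approximately central campus) and the use of kilometres are both assumptions.

```python
import math

# Approximate coordinates of Iowa State University's central campus (assumed).
ISU_LAT, ISU_LON = 42.0267, -93.6465
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_from_isu(lat, lon):
    """Hypothetical helper: distance of a house from the assumed campus point."""
    return haversine_km(lat, lon, ISU_LAT, ISU_LON)
```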

House Size

Total Living Area vs Sale Price

Larger living area is associated with a higher sale price, but the effect diminishes as living area continues to increase.
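One way to check this diminishing effect is to regress log price on log living area. A slope b between 0 and 1 means price rises with area, but each additional square foot adds less than the one before, because price proportional to area^b is concave for b < 1. The minimal NumPy sketch below illustrates this. The synthetic data in the test are illustrative only and are not Ames results.

```python
import numpy as np


def loglog_slope(area, price):
    """Fit log(price) = log(a) + b * log(area) by least squares; return the elasticity b."""
    # np.polyfit returns coefficients highest degree first: [slope, intercept].
    b, _log_a = np.polyfit(np.log(area), np.log(price), 1)
    return b
```

A fitted slope above 1 would instead indicate increasing returns to size.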

Living Area vs Sale Price

House Age

Age Characteristics of Houses in Ames

Effective House Age vs Sale Price

Effective house age is calculated as Year Sold - Year Remodeled. The lower the effective age, the higher the sale price tends to be. This suggests there may be an opportunity for home flippers to profit by remodeling houses in Ames.

Actual Age vs Sale Price

[Figure; y-axis: Sale Price]

Actual age is calculated as Year Sold - Year Built. Excluding the outlier, newer houses tend to sell for higher prices.
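The two age definitions above can be written as small helpers. The sketch below assumes the original Ames column names YrSold, YearBuilt and YearRemodAdd; the notebook itself does not show them. In the Ames data dictionary, the remodel year equals the construction year when a house was never remodeled, so in that case effective age equals actual age.

```python
def effective_age(yr_sold, year_remod_add):
    """Years between the last remodel and the sale."""
    return yr_sold - year_remod_add


def actual_age(yr_sold, year_built):
    """Years between construction and the sale."""
    return yr_sold - year_built


# Hypothetical record using the assumed Ames column names.
house = {"YrSold": 2008, "YearBuilt": 1965, "YearRemodAdd": 2001}
print(effective_age(house["YrSold"], house["YearRemodAdd"]))  # years since remodel
print(actual_age(house["YrSold"], house["YearBuilt"]))        # years since construction
```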

Average House Age by Neighborhood

House Features

Basement

Kitchen

[Figure; y-axis: Sale Price]
  • Higher kitchen quality is associated with a higher house price
  • For kitchens of the same quality, the number of kitchens makes no significant difference to price
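The regression models in this notebook use numeric versions of the quality ratings, such as KitchenQual_num, BsmtQual_num and FireplaceQu_num. The mapping itself is not shown. A plausible sketch assumes the standard Ames five-level scale, from Ex (excellent) down to Po (poor), and turns each rating into an ordered integer so that "higher quality" becomes "larger number". The specific values 5 through 1, and 0 for a missing feature, are assumptions.

```python
# Hypothetical ordinal mapping for the Ames quality ratings.
QUALITY_TO_NUM = {"Ex": 5, "Gd": 4, "TA": 3, "Fa": 2, "Po": 1}


def encode_quality(rating):
    """Map a quality label to an integer; anything else (e.g. no basement) becomes 0."""
    return QUALITY_TO_NUM.get(rating, 0)


print([encode_quality(r) for r in ["Ex", "TA", None]])
```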

Garage

  • Houses with good or excellent garage quality have paved driveways and higher sale prices
  • For the same garage quality, houses with a paved driveway sell for more than those without one

Bathroom

[Figure; y-axis: Sale Price]

The graph shows a positive relationship between the number of bathrooms and sale price.
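The regression models use a num_bathroom feature whose definition is not shown in the notebook. A common choice, sketched below as an assumption, adds full bathrooms above grade and in the basement and counts each half bathroom as 0.5. The argument names mirror the Ames columns FullBath, HalfBath, BsmtFullBath and BsmtHalfBath, which is also an assumption about the source data.

```python
def num_bathroom(full_bath, half_bath, bsmt_full_bath, bsmt_half_bath, half_weight=0.5):
    """Total bathroom count; half baths are weighted by half_weight (assumed 0.5)."""
    full = full_bath + bsmt_full_bath
    half = half_bath + bsmt_half_bath
    return full + half_weight * half


print(num_bathroom(full_bath=2, half_bath=1, bsmt_full_bath=1, bsmt_half_bath=0))
```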

Other Interior Features

Electricity, Heating, Central Air, and Fireplace

  • Having better heating quality is related to higher sale price
  • Having central air is related to higher sale price
  • More fireplaces and better fireplace quality are related to a higher sale price

Other Exterior Features

Land Slope, Building Type, Porch, Roof Style

Exterior Condition and Quality

Feature Engineering

  • Building Age
  • Remodeled
  • IsPUD: flags Planned Unit Development (PUD) dwelling types, based on the dwelling-type (MSSubClass) codes below:
    20  1-STORY 1946 & NEWER ALL STYLES
    30  1-STORY 1945 & OLDER
    40  1-STORY W/FINISHED ATTIC ALL AGES
    45  1-1/2 STORY - UNFINISHED ALL AGES
    50  1-1/2 STORY FINISHED ALL AGES
    60  2-STORY 1946 & NEWER
    70  2-STORY 1945 & OLDER
    75  2-1/2 STORY ALL AGES
    80  SPLIT OR MULTI-LEVEL
    85  SPLIT FOYER
    90  DUPLEX - ALL STYLES AND AGES
   120  1-STORY PUD (Planned Unit Development) - 1946 & NEWER
   150  1-1/2 STORY PUD - ALL AGES
   160  2-STORY PUD - 1946 & NEWER
   180  PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
   190  2 FAMILY CONVERSION - ALL STYLES AND AGES
  • LotIsReg
  • HillORDepr
  • PosFeat
  • ExtMatl
  • TotalPorchSF
  • HasFence
  • Funct_3: groups the home functionality ratings below into three levels:
   Typ  Typical Functionality
   Min1 Minor Deductions 1
   Min2 Minor Deductions 2
   Mod  Moderate Deductions
   Maj1 Major Deductions 1
   Maj2 Major Deductions 2
   Sev  Severely Damaged
   Sal  Salvage only
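The notebook lists only the names of the engineered features, so the definitions in this sketch are hypotheses based on those names and on the code tables.

- IsPUD uses the four dwelling-type codes whose description mentions PUD.
- Remodeled compares the remodel and construction years.
- BldgAge mirrors the actual-age formula used earlier.
- Funct_3 collapses the eight functionality ratings into Normal, a hypothetical Minor level, and ModToSev.

Column names follow the original Ames data, and missing values are assumed to arrive as None. Both are assumptions.

```python
PUD_CODES = {120, 150, 160, 180}  # dwelling-type codes whose description mentions PUD

# Hypothetical three-level grouping of the Functional ratings ("Minor" is our label).
FUNCT_3 = {
    "Typ": "Normal",
    "Min1": "Minor", "Min2": "Minor",
    "Mod": "ModToSev", "Maj1": "ModToSev", "Maj2": "ModToSev",
    "Sev": "ModToSev", "Sal": "ModToSev",
}

# Assumed Ames porch columns summed into TotalPorchSF.
PORCH_COLUMNS = ("OpenPorchSF", "EnclosedPorch", "3SsnPorch", "ScreenPorch")


def engineer_features(row):
    """row: dict keyed by the original Ames column names (assumed)."""
    return {
        "BldgAge": row["YrSold"] - row["YearBuilt"],
        "Remodeled": int(row["YearRemodAdd"] != row["YearBuilt"]),
        "IsPUD": int(row["MSSubClass"] in PUD_CODES),
        "HasFence": int(row.get("Fence") is not None),
        "TotalPorchSF": sum(row.get(col, 0) for col in PORCH_COLUMNS),
        # The Ames documentation says to assume typical functionality unless noted.
        "Funct_3": FUNCT_3.get(row["Functional"], "Normal"),
    }
```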
Linear Regression

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.917
Model:                            OLS   Adj. R-squared:                  0.913
Method:                 Least Squares   F-statistic:                     279.5
Date:                Mon, 30 Nov 2020   Prob (F-statistic):               0.00
Time:                        11:58:04   Log-Likelihood:                 1762.8
No. Observations:                2243   AIC:                            -3354.
Df Residuals:                    2157   BIC:                            -2862.
Df Model:                          85                                         
Covariance Type:            nonrobust                                         
=========================================================================================
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
const                     8.2923      0.104     79.768      0.000       8.088       8.496
Distance                 -0.0149      0.003     -4.367      0.000      -0.022      -0.008
Alley                    -0.0089      0.012     -0.776      0.438      -0.032       0.014
OverallQual               0.0668      0.003     20.231      0.000       0.060       0.073
OverallCond               0.0395      0.003     14.028      0.000       0.034       0.045
TotRmsAbvGrd              0.0314      0.003     12.191      0.000       0.026       0.036
Fireplaces                0.0455      0.008      5.865      0.000       0.030       0.061
GarageArea                0.0002   1.85e-05     12.121      0.000       0.000       0.000
MoSold                   -0.0006      0.001     -0.688      0.491      -0.002       0.001
MasVnrArea2               0.0121      0.006      2.037      0.042       0.000       0.024
total_LivArea             0.2857      0.013     22.525      0.000       0.261       0.311
num_bathroom              0.0199      0.005      3.846      0.000       0.010       0.030
BldgAge                  -0.0012      0.000     -4.977      0.000      -0.002      -0.001
Remodeled                -0.0108      0.006     -1.804      0.071      -0.023       0.001
IsPUD                     0.0143      0.052      0.277      0.782      -0.087       0.116
LotIsReg                 -0.0128      0.006     -2.314      0.021      -0.024      -0.002
HillORDepr                0.0504      0.011      4.548      0.000       0.029       0.072
PosFeat                   0.0194      0.017      1.174      0.240      -0.013       0.052
BsmtQual_num              0.0151      0.003      5.176      0.000       0.009       0.021
KitchenQual_num           0.0151      0.003      5.465      0.000       0.010       0.021
FireplaceQu_num           0.0020      0.001      1.404      0.161      -0.001       0.005
GarageQual_num            0.0019      0.006      0.326      0.744      -0.009       0.013
BsmtCond_num             -0.0009      0.004     -0.263      0.793      -0.008       0.006
GarageCond_num            0.0063      0.006      1.013      0.311      -0.006       0.018
HeatingQC_num             0.0065      0.002      3.884      0.000       0.003       0.010
TotalPorchSF              0.0035      0.001      2.524      0.012       0.001       0.006
HasFence                 -0.0086      0.006     -1.332      0.183      -0.021       0.004
MSZoning_FV               0.2530      0.037      6.924      0.000       0.181       0.325
MSZoning_I (all)          0.2445      0.124      1.970      0.049       0.001       0.488
MSZoning_RH               0.1519      0.045      3.401      0.001       0.064       0.240
MSZoning_RL               0.1978      0.033      5.966      0.000       0.133       0.263
MSZoning_RM               0.1382      0.033      4.194      0.000       0.074       0.203
BldgType_2fmCon          -0.0533      0.019     -2.822      0.005      -0.090      -0.016
BldgType_Duplex          -0.1048      0.017     -6.217      0.000      -0.138      -0.072
BldgType_Twnhs           -0.1277      0.053     -2.393      0.017      -0.232      -0.023
BldgType_TwnhsE          -0.0787      0.052     -1.514      0.130      -0.181       0.023
HouseStyle_1.5Unf         0.0225      0.028      0.806      0.421      -0.032       0.077
HouseStyle_1Story         0.0184      0.010      1.849      0.065      -0.001       0.038
HouseStyle_2.5Fin        -0.0717      0.049     -1.473      0.141      -0.167       0.024
HouseStyle_2.5Unf         0.0048      0.029      0.167      0.868      -0.052       0.061
HouseStyle_2Story        -0.0349      0.010     -3.466      0.001      -0.055      -0.015
HouseStyle_SFoyer        -0.0386      0.020     -1.974      0.048      -0.077      -0.000
HouseStyle_SLvl          -0.0544      0.015     -3.542      0.000      -0.084      -0.024
Foundation_CBlock         0.0117      0.011      1.087      0.277      -0.009       0.033
Foundation_PConc          0.0309      0.012      2.519      0.012       0.007       0.055
Foundation_Slab           0.0067      0.030      0.227      0.820      -0.051       0.065
Foundation_Stone          0.0419      0.042      0.994      0.320      -0.041       0.124
Foundation_Wood           0.0632      0.059      1.078      0.281      -0.052       0.178
BsmtExposure_Gd           0.0407      0.011      3.762      0.000       0.019       0.062
BsmtExposure_Mn          -0.0253      0.011     -2.281      0.023      -0.047      -0.004
BsmtExposure_No          -0.0213      0.008     -2.612      0.009      -0.037      -0.005
CentralAir_Y              0.0332      0.013      2.493      0.013       0.007       0.059
Electrical_FuseF         -0.0363      0.022     -1.622      0.105      -0.080       0.008
Electrical_FuseP         -0.0007      0.046     -0.015      0.988      -0.092       0.090
Electrical_SBrkr         -0.0172      0.010     -1.649      0.099      -0.038       0.003
GarageType_Attchd         0.0436      0.028      1.564      0.118      -0.011       0.098
GarageType_Basment        0.0333      0.037      0.901      0.368      -0.039       0.106
GarageType_BuiltIn        0.0358      0.030      1.201      0.230      -0.023       0.094
GarageType_CarPort       -0.0040      0.051     -0.078      0.938      -0.105       0.097
GarageType_Detchd         0.0284      0.028      1.026      0.305      -0.026       0.083
GarageType_None          -0.0285      0.122     -0.233      0.816      -0.269       0.212
GarageFinish_None         0.1134      0.125      0.909      0.364      -0.131       0.358
GarageFinish_RFn          0.0020      0.007      0.281      0.779      -0.012       0.016
GarageFinish_Unf          0.0065      0.009      0.760      0.448      -0.010       0.023
PavedDrive_P              0.0078      0.019      0.419      0.675      -0.029       0.044
PavedDrive_Y              0.0340      0.012      2.865      0.004       0.011       0.057
SaleCondition_AdjLand     0.0589      0.084      0.699      0.485      -0.106       0.224
SaleCondition_Alloca      0.1287      0.069      1.863      0.063      -0.007       0.264
SaleCondition_Family     -0.0854      0.033     -2.592      0.010      -0.150      -0.021
SaleCondition_Normal      0.0580      0.016      3.531      0.000       0.026       0.090
SaleCondition_Partial     0.1109      0.021      5.189      0.000       0.069       0.153
SchD_S_5                  0.0336      0.009      3.541      0.000       0.015       0.052
ExtMatl_AsphShn           0.0838      0.119      0.705      0.481      -0.149       0.317
ExtMatl_BrkFace           0.0890      0.030      2.973      0.003       0.030       0.148
ExtMatl_CBlock           -0.1347      0.117     -1.155      0.248      -0.363       0.094
ExtMatl_HdBoard           0.0066      0.025      0.268      0.789      -0.042       0.055
ExtMatl_ImStucc           0.0347      0.116      0.300      0.764      -0.192       0.262
ExtMatl_MetalSd           0.0380      0.024      1.574      0.116      -0.009       0.085
ExtMatl_Mixed             0.0280      0.024      1.152      0.249      -0.020       0.076
ExtMatl_Plywood           0.0028      0.026      0.107      0.915      -0.048       0.053
ExtMatl_PreCast           0.4219      0.117      3.599      0.000       0.192       0.652
ExtMatl_Stucco            0.0357      0.033      1.088      0.277      -0.029       0.100
ExtMatl_VinylSd           0.0192      0.024      0.789      0.430      -0.029       0.067
ExtMatl_Wd Sdng           0.0130      0.024      0.541      0.589      -0.034       0.060
Funct_3_ModToSev         -0.0550      0.020     -2.753      0.006      -0.094      -0.016
Funct_3_Normal            0.0178      0.012      1.485      0.138      -0.006       0.041
==============================================================================
Omnibus:                      490.653   Durbin-Watson:                   2.068
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4843.044
Skew:                          -0.741   Prob(JB):                         0.00
Kurtosis:                      10.045   Cond. No.                     3.73e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.73e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
P-values of the terms significant at the 5% level (p < 0.05):
const                     0.000000e+00
Distance                  1.317128e-05
OverallQual               1.785328e-83
OverallCond               7.581546e-43
TotRmsAbvGrd              4.128864e-33
Fireplaces                5.174094e-09
GarageArea                9.223537e-33
MasVnrArea2               4.180283e-02
total_LivArea            4.409789e-101
num_bathroom              1.234449e-04
BldgAge                   6.952364e-07
LotIsReg                  2.074494e-02
HillORDepr                5.706978e-06
BsmtQual_num              2.479304e-07
KitchenQual_num           5.174364e-08
HeatingQC_num             1.056605e-04
TotalPorchSF              1.168088e-02
MSZoning_FV               5.777838e-12
MSZoning_I (all)          4.897592e-02
MSZoning_RH               6.832804e-04
MSZoning_RL               2.834646e-09
MSZoning_RM               2.851459e-05
BldgType_2fmCon           4.820073e-03
BldgType_Duplex           6.061650e-10
BldgType_Twnhs            1.679243e-02
HouseStyle_2Story         5.376374e-04
HouseStyle_SFoyer         4.848411e-02
HouseStyle_SLvl           4.048050e-04
Foundation_PConc          1.182523e-02
BsmtExposure_Gd           1.732428e-04
BsmtExposure_Mn           2.262584e-02
BsmtExposure_No           9.074225e-03
CentralAir_Y              1.274814e-02
PavedDrive_Y              4.209032e-03
SaleCondition_Family      9.598061e-03
SaleCondition_Normal      4.233470e-04
SaleCondition_Partial     2.306671e-07
SchD_S_5                  4.075254e-04
ExtMatl_BrkFace           2.983124e-03
ExtMatl_PreCast           3.265755e-04
Funct_3_ModToSev          5.948547e-03
dtype: float64
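The list above keeps only the terms whose p-value is below 0.05; compare it with the P>|t| column of the summary. With statsmodels, and assuming the model was fit on a pandas DataFrame, one way to produce such a list is `results.pvalues[results.pvalues < 0.05]`. To show where those numbers come from, the sketch below fits OLS with NumPy and computes two-sided p-values. As a simplifying assumption, it uses the normal approximation to the t distribution. That approximation is very close to the statsmodels values when the residual degrees of freedom are large (2157 here).

```python
import math

import numpy as np


def ols_with_pvalues(X, y):
    """Fit OLS with an intercept; return (coefficients, two-sided p-values).

    p-values use a normal approximation to the t distribution.
    """
    X1 = np.column_stack([np.ones(len(X)), X])      # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]                  # residual degrees of freedom
    sigma2 = resid @ resid / dof                     # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    t = beta / se
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    pvalues = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return beta, pvalues


def significant(names, pvalues, level=0.05):
    """Names of the terms whose p-value is below the given level."""
    return [name for name, p in zip(names, pvalues) if p < level]
```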
print("R^2 of Train set:", lin_reg.score(X_train, y_train))
print("R^2 of Test set:", lin_reg.score(X_test, y_test))
R^2 of Train set: 0.9167675316925197
R^2 of Test set: 0.8553809707773833
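For scikit-learn regressors, `score` returns the coefficient of determination R² on the data passed in. The two numbers are therefore training and test R², and the training value matches the OLS R-squared of 0.917. The gap between them (0.917 vs 0.855) gives a rough sense of overfitting. A minimal NumPy version of the formula, for reference:

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```

On test data R² can be negative, which happens when a model predicts worse than the test-set mean.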

Decision Tree

R^2 of Train set: 0.9369999792245037
R^2 of Test set: 0.7704724966752604
           feature  importance
0      OverallQual    0.574323
1    total_LivArea    0.278932
2     num_bathroom    0.031961
3       GarageArea    0.024335
4      MSZoning_R…    0.013634
5       Fireplaces    0.010243
6          BldgAge    0.007780
7      OverallCond    0.006985
8         Distance    0.006923
9     BsmtQual_num    0.006482
10    TotalPorchSF    0.005796
11   HeatingQC_num    0.003707
12     SaleCondit…    0.002729
13     SaleCondit…    0.002523
14    TotRmsAbvGrd    0.002211
15          MoSold    0.001882
16  Funct_3_Normal    0.001808
17     HouseStyle…    0.001675
18 FireplaceQu_num    0.001630
19     PavedDrive…    0.001618
CPU times: user 3.56 s, sys: 624 ms, total: 4.18 s
Wall time: 6.97 s
Grid Search Best Parameters: {'criterion': 'mse', 'min_samples_leaf': 8, 'min_samples_split': 28}
Grid Search Best Scores: 0.8268176932421467
Grid Search R2 of Train set: 0.8983554977590436
Grid Search R2 of Test set: 0.797853693953499
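A grid search fits one model per parameter combination per cross-validation fold, so its cost grows multiplicatively with every parameter added to the grid. This is one reason the random forest search takes much longer than the decision tree search. The standard-library sketch below expands a grid the same way scikit-learn's `ParameterGrid` does. The candidate values in the test are hypothetical; the notebook reports only the best ones. `cv=5` matches the `GridSearchCV` default, but the notebook's actual fold count is not shown.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield every combination of a dict of parameter lists, as a grid search does."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def n_fits(param_grid, cv=5):
    """Number of model fits: (number of combinations) x (number of folds)."""
    n_candidates = 1
    for values in param_grid.values():
        n_candidates *= len(values)
    return n_candidates * cv
```

In recent scikit-learn releases, the `'mse'` criterion reported above is named `'squared_error'`; the old name was deprecated in 1.0 and removed in 1.2.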

Random Forest Model

R^2 of Train set: 0.9832981784794597
R^2 of Test set: 0.8211597655778335
           feature  importance
0    total_LivArea    0.128920
1      OverallQual    0.087664
2       GarageArea    0.070480
3          BldgAge    0.065392
4     num_bathroom    0.055254
5     BsmtQual_num    0.049037
6  KitchenQual_num    0.045427
7  FireplaceQu_num    0.036407
8     TotRmsAbvGrd    0.034690
9       Fireplaces    0.034439
10    TotalPorchSF    0.032197
11     Foundation…    0.029651
12   HeatingQC_num    0.020817
13        Distance    0.020533
14     GarageFini…    0.018557
15     OverallCond    0.017835
16     GarageType…    0.017016
17     MSZoning_R…    0.013469
18     MasVnrArea…    0.013318
19     GarageType…    0.012310
CPU times: user 29.3 s, sys: 1.7 s, total: 31 s
Wall time: 1min 30s
Grid Search Best Parameters: {'criterion': 'mse', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100, 'random_state': 42}
Grid Search Best Scores: 0.8792817295203168
Grid Search R2 of Train set: 0.9830955549029718
Grid Search R2 of Test set: 0.821394150770663

Gradient Boosting Model

R^2 of Train set: 0.17731437608580825
R^2 of Test set: 0.15184788172704156
             feature  importance
0        OverallQual    0.383076
1      total_LivArea    0.359307
2         GarageArea    0.054201
3    FireplaceQu_num    0.032072
4            BldgAge    0.030127
5    KitchenQual_num    0.024112
6        OverallCond    0.018578
7       num_bathroom    0.011860
8        MSZoning_RM    0.011799
9       CentralAir_Y    0.011017
10        Fireplaces    0.007108
11      TotRmsAbvGrd    0.006971
12      TotalPorchSF    0.006776
13       MSZoning_RL    0.006041
14 GarageType_Attchd    0.004861
15      BsmtQual_num    0.004315
16    GarageCond_num    0.004168
17          Distance    0.004092
18     HeatingQC_num    0.004044
19      PavedDrive_Y    0.002125
[Figure: Feature Importance Plot of 1000-Tree GBM]

Lasso Regression

3.158590218274496e-05
feature coef
2 OverallQual 0.068616
3 OverallCond 0.036761
4 TotRmsAbvGrd 0.028648
5 Fireplaces 0.043930
6 GarageArea 0.000215
8 MasVnrArea2 0.007217
9 total_LivArea 0.290148
10 num_bathroom 0.018246
15 HillORDepr 0.048925
16 PosFeat 0.016757
17 ExterQual_num 0.009755
18 BsmtQual_num 0.012905
19 KitchenQual_num 0.013091
20 FireplaceQu_num 0.002377
24 GarageCond_num 0.000307
25 HeatingQC_num 0.005984
26 TotalPorchSF 0.002698
28 MSZoning_FV 0.101146
29 MSZoning_I (all) 0.002829
31 MSZoning_RL 0.063494
37 HouseStyle_1.5Unf 0.000464
38 HouseStyle_1Story 0.024071
45 Foundation_PConc 0.016294
48 Foundation_Wood 0.015804
49 BsmtExposure_Gd 0.041441
52 CentralAir_Y 0.045717
56 GarageType_Attchd 0.013846
66 PavedDrive_Y 0.036055
68 SaleCondition_Alloca 0.077351
70 SaleCondition_Normal 0.039659
71 SaleCondition_Partial 0.087334
72 SchD_S_5 0.026390
73 ExtMatl_AsphShn 0.001317
74 ExtMatl_BrkFace 0.050218
78 ExtMatl_MetalSd 0.013079
79 ExtMatl_Mixed 0.003917
81 ExtMatl_PreCast 0.353994
86 Funct_3_Normal 0.016080
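Lasso adds an L1 penalty to least squares. scikit-learn's `Lasso` minimises (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 by coordinate descent. The penalty can shrink coefficients exactly to zero, which makes Lasso a feature-selection method as well as a regression model. The NumPy sketch below implements that idea with cyclic coordinate descent and soft-thresholding. It is a teaching sketch, not the notebook's code. It handles the intercept by centring and assumes no constant feature columns.

```python
import numpy as np


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / (2n)) * ||y - Xw - b||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean        # centring lets the intercept drop out
    w = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n     # assumes no constant columns
    resid = yc.copy()                      # residual yc - Xc @ w, with w starting at 0
    for _ in range(n_iter):
        for j in range(p):
            resid += Xc[:, j] * w[j]       # remove feature j's current contribution
            rho = Xc[:, j] @ resid / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= Xc[:, j] * w[j]       # add back the updated contribution
    intercept = y_mean - x_mean @ w
    return w, intercept
```

Larger values of alpha zero out more coefficients. Very small values, such as the 3.16e-05 printed above the table, leave the fit close to ordinary least squares.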

Conclusions and Recommendations

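  • The linear regression model had the best test-set performance of the models compared (test R^2 of 0.855, versus 0.821 for the random forest, 0.798 for the tuned decision tree and 0.152 for gradient boosting), so it may be the better choice for predicting Ames housing prices
  • Housing prices per square foot in Ames, Iowa were relatively stable from 2006 to 2010
  • We recommend that home buyers focus first and foremost on the overall quality (overall material and finish) of the house in order to maximize profit: OverallQual is the most important feature in the decision tree and gradient boosting models and one of the most significant terms in the linear regression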